A68k - a freely distributable assembler for the Amiga
by Charlie Gibbs
with special thanks to
Brian R. Anderson and Jeff Lydiatt
(Version 2.41 - January 6, 1989)
Note: This program is Freely-Distributable, as opposed to Public
Domain. Permission is given to freely distribute this program provided no
fee is charged, and this documentation file is included with the program.
This assembler is based on Brian R. Anderson's 68000 cross-
assembler published in Dr. Dobb's Journal, April through June 1986.
I have converted it to produce AmigaDOS-format object modules, and
have made many enhancements, such as macros and INCLUDE files.
My first step was to convert the original Modula-2 code into C.
I did this for two reasons. First, I had access to a C compiler, but
not a Modula-2 compiler. Second, I like C better anyway.
The executable code generator (GetObjectCode and MergeModes)
is essentially the same as in the original article, aside from its
translation into C. I have almost completely rewritten the remainder
of the code, however, in order to remove restrictions, add enhancements,
and adapt it to the AmigaDOS environment. Since the only reference book
available to me was the AmigaDOS Developer's Manual (Bantam, February
1986), the assembler and the remainder of this document work in terms
of that book.
RESTRICTIONS
Let's get these out of the way first. There are a few things that I
have not yet implemented, and some outright bugs that would take too long
to correct for this version.
o The verification file (-v) option is not supported. Diagnostic
messages always appear on the console. They also appear in the
listing file, however (see extensions below). You can produce
an error file by redirecting console output to a file - the
line number counter and final summary are displayed on stderr
so you can still see what's happening.
o The directory names in the INCLUDE directory list (-i) must be separated
by commas. The list may not be enclosed in quotes.
o Labels assigned by EQUR and REG directives are case-sensitive.
o The following directives are not supported, and will be flagged as
invalid op-codes:
OFFSET
NOPAGE
LLEN
PLEN
NOOBJ
FAIL
FORMAT
NOFORMAT
MASK2
I feel that NOPAGE, LLEN, and PLEN should not be defined within a
source module. It doesn't make sense to me to have to change your
program just because you want to print your listings on different
paper. The command-line option "-p" (see below) can be used as a
replacement for PLEN; setting it to a high value (like 32767) is a
good substitute for NOPAGE. The effect of LLEN can be obtained
by running the listing file through an appropriate filter.
EXTENSIONS
Now for the good stuff:
o Labels can be any length that will fit onto one source line
(currently 127 bytes maximum). Since labels are stored on the
heap, the number of labels that can be processed is limited only
by available memory.
o Since section data and user macro definitions are stored in the
symbol table (see above), they too are limited only by available
memory. (Actually, there is a hard-coded limit of 32767 sections,
but I doubt anyone will run into that one.)
o The only values a label cannot take are the register names - the
assembler can distinguish between the same name used as a label,
instruction name or directive, macro name, or section name.
o Section and user macro names appear in the symbol table dump, and
will also be cross-referenced. Their names can be the same as any
label (see above); the assembler can sort them out.
o INCLUDEs and macro calls can be nested indefinitely, limited only
by available memory. The message "Secondary heap overflow -
assembly terminated" will be displayed if memory is exhausted.
You can increase the size of this heap using the -w parameter
(see below). Recursive macros are supported; recursive INCLUDEs
will, of course, result in a loop that will be broken only when
the heap overflows.
o The EVEN directive forces alignment on a word (2-byte) boundary.
It does the same thing as CNOP 0,2.
(This one is left over from the original code.)
o Branch (Bcc) instructions to a previously-defined label will be
automatically converted to short form if possible. This feature is
not available for forward branches, since in pass 1 the assembler
doesn't yet know how far the branch must go. You can, however,
ask A68k to tell you which instructions can be coded as short
branches by using the -f command-line switch (see below).
o Backward references to labels within the current CODE section
will be converted to PC relative addressing with displacement
if this mode is legal for the instruction.
o If a MOVEM instruction only specifies one register, it is converted
to the corresponding MOVE instruction. Instructions such as
MOVEM D0-D0,label will not be converted, however.
o ADD, SUB, and MOVE instructions will be converted to ADDQ, SUBQ,
and MOVEQ respectively if possible. Instructions coded explicitly
as (for example) ADDA or ADDI will not be converted.
o ADD, CMP, SUB, and MOVE to an address register are converted to
ADDA, CMPA, SUBA, and MOVEA respectively, unless (for ADD, SUB,
or MOVE) they have already been converted to quick form.
o ADD, AND, CMP, EOR, OR, and SUB of an immediate value are converted
to ADDI, ANDI, CMPI, EORI, ORI, and SUBI respectively (unless the
address register or quick conversion above has already been done).
o If both operands of a CMP instruction are postincrement mode, the
instruction is converted to CMPM.
o Operands of the form 0(An) will be treated as (An) except for
the MOVEP instruction, which always requires a displacement.
o The SECTION directive allows a third parameter. This can be
specified as either CHIP or FAST (upper- or lower-case). If this
parameter is present, the hunk will be written with the MEMF_CHIP
or MEMF_FAST bit set. This allows you to produce "pre-ATOMized"
object modules.
o The synonyms DATA and BSS are accepted for SECTION directives
starting data or BSS hunks. The CHIP and FAST options mentioned
above can also be used, e.g. BSS name,CHIP.
o The following synonyms have been implemented for compatibility
with the Aztec assembler:
CSEG is treated the same as CODE or SECTION name,CODE
DSEG is treated the same as DATA or SECTION name,DATA
PUBLIC is treated as either XDEF or XREF, depending on
whether or not the symbol in question has been
defined in the current source module.
A single PUBLIC directive can name a mixture
of internally- and externally-defined symbols.
o The ability to produce Motorola S-records is retained from the
original code. The -s option causes the assembler to produce
S-format instead of AmigaDOS format. Relocatable code cannot be
produced in this format.
o Error messages consist of three parts.
The position of the offending line is given as a line number
within the current module. If the line is within a macro expan-
sion or INCLUDE file, the position of the macro call or INCLUDE
statement in the outer module is given as well. This process
is repeated until the outermost source module is reached.
Next, the offending source line itself is listed.
Finally, the errors for that line are displayed. A flag
(^) is placed under the column where the error was detected.
o Named local labels are supported. These work the same as the
local labels supported by the Metacomco assembler (nnn$) but
can be formed in the same manner as normal labels, except that
the first character must be a backslash (\).
o The following synonyms have been implemented for compatibility
with the Assempro assembler:
ENDIF is treated the same as ENDC
= is treated the same as EQU
| is treated the same as ! (logical OR)
o Quotation marks (") can be used as string delimiters
as well as apostrophes ('). Any given string must begin
and end with the same delimiter. This allows such statements
as the following:
MOVEQ '"',D0
DC.B "This is Charlie's assembler."
Note that you can still define an apostrophe within a string
delimited by apostrophes if you double it, e.g.
MOVEQ '''',D0
DC.B 'This is Charlie''s assembler.'
o If any errors are found in the assembly, the object code file
will be scratched, unless you specified the -k (keep) flag
on the command line.
o The symbols .A68K, .a68k, .a68K, and .A68k are automatically
defined as SET symbols having absolute values of 1.
This enables a source program to determine whether it is
being assembled by this assembler, and is effectively
insensitive as to whether or not it is checked in upper case.
o A zeroth positional macro parameter (\0) is supported. It
is replaced by the size suffix of the macro call (B, W, or L,
defaulting to W). For instance, given the macro:
moov MACRO
move.\0 \1,\2
ENDM
the macro call
moov.l d0,d1
would be expanded as
move.l d0,d1
o If an INCLUDE file doesn't generate any code and no listing
file is required (including suppression of the listing using
NOLIST), it won't be read again in pass 2. The statement
numbers will be bumped to keep in proper alignment. This
can really speed up assemblies that INCLUDE lots of EQUates.
o The ORG directive is supported. It works like RORG, except
that it takes the actual address to be jumped to, rather
than an offset from the start of the current section.
The given address must be in the current section.
As far as A68k is concerned, the only real difference
between ORG and RORG is that the ORG value must be
relocatable, while the RORG value must be absolute.
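As an illustration of the quick-form conversions listed above, here is a minimal C sketch of the operand test involved. It is not A68k's own code; the function quick_form() and its parameters are hypothetical. It relies only on the 68000 encoding limits: ADDQ and SUBQ accept immediate data from 1 to 8, and MOVEQ loads a sign-extended 8-bit value into all 32 bits of a data register.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not A68k's code): returns the quick mnemonic that
 * can replace "op.size #value,<dest>", or NULL if none applies.
 * op must be exactly "ADD", "SUB" or "MOVE"; explicitly coded ADDI, ADDA,
 * etc. are never converted. dest_is_dreg is only consulted for MOVEQ. */
static const char *quick_form(const char *op, char size, long value,
                              bool dest_is_dreg)
{
    /* ADDQ/SUBQ: immediate data 1..8 (8 is encoded as 0 in the opcode) */
    if (value >= 1 && value <= 8) {
        if (strcmp(op, "ADD") == 0)
            return "ADDQ";
        if (strcmp(op, "SUB") == 0)
            return "SUBQ";
    }
    /* MOVEQ writes a sign-extended 8-bit value to all 32 bits of Dn,
     * so only MOVE.L #value,Dn has exactly the same effect */
    if (strcmp(op, "MOVE") == 0 && size == 'L' && dest_is_dreg &&
        value >= -128 && value <= 127)
        return "MOVEQ";
    return NULL;
}
```

For example, quick_form("MOVE", 'L', 5, true) yields "MOVEQ", while a word-sized MOVE is left alone because MOVEQ would also change the upper word of the register.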
THE SMALL CODE / SMALL DATA MODEL
Version 2.4 implements a rudimentary small code/data model.
It consists of converting any data reference to one of the following
three addressing modes:
address register indirect with displacement (using A4)
(for references to the DATA or BSS section)
program counter indirect with displacement
(for references to the CODE section)
absolute word
(for absolute and 16-bit relocatable values)
These conversions do not take place unless a NEAR directive is
encountered. Any operands on the NEAR directive are ignored.
Conversion is done for all operands until a FAR directive is
encountered. NEAR and FAR directives can occur any number of
times, enabling conversion to be turned on and off at will.
Backward references which cannot be converted (e.g. external
labels declared as XREF) will remain as absolute long addressing.
All forward references are assumed to be convertible, since during
pass 1 A68k has no way of telling whether conversion is possible.
If conversion turns out to be impossible, invalid object code will
be generated - an error message ("Invalid forward reference") will
indicate when this occurs.
Although the small code/data model can greatly reduce the
size of assembled programs, several restrictions apply:
o Small code and small data models are active simultaneously.
You can't have one without the other, since during pass 1
A68k doesn't know whether forward references are to CODE
or to DATA/BSS.
o Programs can consist of a maximum of two sections,
one CODE, the other DATA or BSS. If you try to define
a third section, the message "Too many SECTIONs" will
be displayed. The NEAR directive is active only within
the CODE section.
o While the NEAR directive is active, external labels (XREF)
must be declared before they are used, CODE section references
must be within 32K of the current position (i.e. expressible as
PC-relative), and DATA/BSS section references must be in the
first 64K of the DATA/BSS section (i.e. expressible as
address register indirect with displacement). Any instructions
which do not satisfy these requirements cannot be detected in
pass 1, so A68k has no choice but to display an error message
in pass 2 ("Invalid forward reference") which in this case
indicates that invalid code has been generated. To properly
assemble such instructions, you can temporarily disable
conversion with a FAR directive, then resume afterwards
with another NEAR directive.
o Conversion cannot be done for references between modules.
All external references must be left as absolute long.
o A68k assumes that register A4 points to the start of the
DATA/BSS section plus 32768 bytes. A4 must be preloaded
with this value before executing any code converted by the
NEAR directive. One way to do this is to code the instruction
that loads the register prior to the NEAR directive. Another
way is to use a MOVE.L with immediate mode, which is never
converted. Here are examples of the two methods:
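Method 1 (load A4 before the NEAR directive):
        LEA     data+32768,a4
        NEAR
        <remainder of code>
        BSS
data:
        <data areas>
        END
Method 2 (MOVE.L with immediate mode, which is never converted):
        NEAR
        MOVE.L  #data+32768,a4
        <remainder of code>
        BSS
data:
        <data areas>
        END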
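The displacement arithmetic behind these restrictions can be sketched in C. The functions below are hypothetical; they assume only what is stated above (A4 holds the start of the DATA/BSS section plus 32768) together with the 68000 rule that a d16(An) or d16(PC) displacement is a signed 16-bit value. For d16(PC), the 68000 uses the address of the extension word holding the displacement as the PC value.

```c
#include <stdbool.h>
#include <stdint.h>

#define A4_BIAS 32768L  /* A4 = start of DATA/BSS section + 32768 */

/* Hypothetical: converts an offset within the DATA/BSS section into a
 * d16(A4) displacement. Returns false if the offset lies outside the
 * first 64K of the section, where no 16-bit displacement can reach. */
static bool a4_displacement(long section_offset, int16_t *disp)
{
    long d = section_offset - A4_BIAS;
    if (d < INT16_MIN || d > INT16_MAX)
        return false;
    *disp = (int16_t)d;
    return true;
}

/* Hypothetical: computes a d16(PC) displacement relative to the address
 * of the extension word. Returns false if the target is out of reach. */
static bool pc_displacement(long ext_word_addr, long target, int16_t *disp)
{
    long d = target - ext_word_addr;
    if (d < INT16_MIN || d > INT16_MAX)
        return false;
    *disp = (int16_t)d;
    return true;
}
```

Biasing A4 by 32768 is what lets a signed displacement cover the whole first 64K: offset 0 maps to -32768(A4) and offset 65535 to 32767(A4).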
HOW TO USE A68k
The command-line syntax to run the assembler is as follows:
a68k <source file name>
[<object file name>]
[<listing file name>]
[-d]
[-e<equate file name>]
[-f]
[-h<header file name>]
[-i<INCLUDE directory list>]
[-k]
[-l<listing file name>]
[-o<object file>]
[-p<page depth>]
[-q[<quiet interval>]]
[-s]
[-t]
[-w[<hash table size>][,<secondary heap size>]]
[-x<listing file name>]
[-y]
[-z[<debug start line>][,<debug end line>]]
These options can be given in any order, and the source file name can
appear before all switches, after them, or anywhere in the middle.
Option values, if any, must immediately follow the keyword with
no intervening spaces.
If the -o keyword is omitted, the object file will be given a default
name. It is created by replacing all characters after the last period in
the source file name by "o". For example, if the source file name is
"myprog.asm", the object file name defaults to "myprog.o". A source name
of "my.new.prog.asm" produces a default object file name of "my.new.prog.o".
If the source file name does not contain a period, ".o" is appended to it
to produce the default object file name.
The default value for the listing file name is arrived at in the same
way as the object file name, except that ".lst" is appended instead of ".o".
If you don't specify this parameter, no listing file will be produced.
If you specify -x (see below), -l (with the default name) is assumed,
although you can still use this parameter if you wish.
The default value for the equate file name is arrived at in the same
way as the object file name, except that ".equ" is appended instead of ".o".
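The default-name rule can be expressed as a short C function. This is a sketch of the behaviour described above, not A68k's source, and default_name() is a hypothetical name. One assumption goes beyond the text: the sketch only looks for a period in the final path component (after the last '/' or ':'), so a period in a directory name is not mistaken for an extension.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds a default output name by replacing
 * everything after the last period of the source name with ext, or by
 * appending "." and ext if there is no period.
 * Assumption: only the final path component is searched for a period.
 * Returns 0 on success, -1 if the result does not fit in out. */
static int default_name(const char *src, const char *ext,
                        char *out, size_t outsize)
{
    const char *file = src;
    for (const char *p = src; *p != '\0'; p++)
        if (*p == '/' || *p == ':')
            file = p + 1;               /* start of final path component */

    const char *dot = strrchr(file, '.');
    size_t base_len = dot ? (size_t)(dot - src) : strlen(src);

    int n = snprintf(out, outsize, "%.*s.%s", (int)base_len, src, ext);
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```

Called with "o", "lst", "equ" or "s", it yields the object, listing, equate and S-record defaults respectively; for instance "my.new.prog.asm" with "lst" becomes "my.new.prog.lst".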
The INCLUDE directory list is a list of directory names separated by
commas. No embedded blanks are allowed. For example, the specification
-imylib,df1:another.lib
will cause INCLUDE files to be searched for first in the current directory,
then in "mylib", then in "df1:another.lib".
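The search order can be sketched in C as a function that produces the i-th candidate path; the caller tries each one (for example with fopen()) until a file opens. The name include_candidate() is hypothetical, and so is the joining rule: the sketch inserts a '/' between directory and file name unless the directory already ends in ':' or '/', following the usual AmigaDOS path convention. The text above does not say how A68k itself builds the path.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the i-th place to look for an INCLUDE file.
 * i == 0 is the current directory (the bare name); i == k (k >= 1) is the
 * k-th directory of the comma-separated -i list.
 * Returns 1 if a candidate was written, 0 when the list is exhausted,
 * and -1 if the result does not fit in out. */
static int include_candidate(const char *name, const char *dirlist, int i,
                             char *out, size_t outsize)
{
    int n;

    if (i == 0) {
        n = snprintf(out, outsize, "%s", name);
    } else {
        const char *p = dirlist ? dirlist : "";
        for (int k = 1; k < i; k++) {   /* skip to the i-th entry */
            p = strchr(p, ',');
            if (p == NULL)
                return 0;
            p++;
        }
        if (*p == '\0')
            return 0;
        size_t len = strcspn(p, ",");
        /* Assumption: add '/' unless the directory already ends in ':'
         * (device or volume name) or '/', as AmigaDOS paths do. */
        const char *sep = (len == 0 || p[len - 1] == ':' || p[len - 1] == '/')
                              ? "" : "/";
        n = snprintf(out, outsize, "%.*s%s%s", (int)len, p, sep, name);
    }
    return (n < 0 || (size_t)n >= outsize) ? -1 : 1;
}
```

A caller would loop i = 0, 1, 2, ... while include_candidate() returns 1, stopping at the first candidate that can be opened.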
The -d keyword causes symbol table entries (hunk_symbol) to be written
to the object module for the use of symbolic debuggers.
The -f keyword causes any forward branches (Bcc, BRA, BSR) that
could be converted to short form to be flagged. A68k can't convert them
automatically because it doesn't know in pass 1 how far the branch will
be. This option tells you which instructions could be manually converted.
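The range test that decides whether a branch fits in short form can be sketched as follows. This is an illustration based on the 68000 encoding, not A68k's code, and can_be_short_branch() is a hypothetical name. The 8-bit displacement is measured from the branch address plus 2, and a displacement of 0 is reserved to mean that a 16-bit displacement word follows, so it cannot be used in a short branch.

```c
#include <stdbool.h>

/* Hypothetical helper: true if a Bcc/BRA/BSR at address pc could reach
 * target using the short (8-bit displacement) form on the 68000. */
static bool can_be_short_branch(long pc, long target)
{
    long disp = target - (pc + 2);   /* relative to the word after opcode */
    /* disp == 0 would be read as "16-bit displacement follows" */
    return disp != 0 && disp >= -128 && disp <= 127;
}
```

For a forward branch the target is only known in pass 2, after the instruction's size was fixed in pass 1, which is why A68k can only flag such branches with -f rather than shorten them.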
The -k keyword causes the object file to be kept even if errors were
found. Without -k, the object file is scratched if any errors occur.
The -l keyword causes a listing file to be produced. If you want
the listing file to include a symbol table dump and cross-reference,
use the -x keyword instead (see below).
The -p keyword causes the page depth to be set to the specified value.
If omitted, a default of 60 lines (-p60) is assumed.
The -q keyword changes the interval at which A68k displays the
current line number (the default is every 10 lines, i.e. -q10). If
you specify -q0 or -q without a value, no line numbers will be displayed.
This will speed up assemblies slightly by reducing console I/O. If -q
is specified as a negative number (e.g. -q-10), line numbers will still
be displayed at the specified interval, but will be given as positions
within the current module (source, macro, or INCLUDE) rather than
as a total statement count - the module name will also be displayed.
The -s keyword, if specified, causes the object file to be written in
Motorola S-record format. If omitted, AmigaDOS format will be produced.
The default name for an S-record file has ".s" appended to the source name,
rather than ".o"; this can still be overridden with the -o keyword, though.
The -t keyword allows tabs in the source file to be passed through
to the listing file, rather than being expanded. In addition, tabs will
be generated in the listing file to skip from the object code to the
source statement, etc. This can greatly reduce the size of the listing
file, as well as making it quicker to produce. Do not use this option
if you will be displaying or listing the list file on a device which
does not assume a tab stop at every 8th position.
The -w keyword specifies the size of the fixed memory areas that
are allocated. The first parameter gives the number of entries that
the hash table will contain (defaulting to 2047). This should be enough
for all but the very largest programs. The assembly will not fail if
this value is too small, but may slow down as a result of A68k having
to search many long hash chains. I've heard that you should really
specify a prime number for this parameter, but I haven't gone into
hashing theory enough to know whether it's actually necessary.
The second parameter of the -w keyword specifies the size of the
secondary heap (defaulting to 1024 bytes, which should be enough
unless you use very deeply nested macros and/or INCLUDE files with long
path names).
You can specify either or both parameters. For example:
-w4093 secondary heap size remains at 1024 bytes
-w,2000 hash table size remains at 2047 entries
-w4093,2000 increases the size of both areas
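Parsing the -w value amounts to reading two optional numbers on either side of a comma and keeping the defaults (2047 hash entries, 1024 bytes of secondary heap) for whichever one is omitted. The following C sketch is illustrative only; parse_w_option() is a hypothetical name, and rejecting zero or negative sizes is an assumption.

```c
#include <stdlib.h>

/* Hypothetical parser for the text following "-w", e.g. "4093", ",2000",
 * "4093,2000" or "" (all defaults). Returns 0 on success, -1 on bad input.
 * Assumption: both sizes must be positive. */
static int parse_w_option(const char *arg, long *hash_size, long *heap_size)
{
    char *end;

    *hash_size = 2047;                  /* default hash table entries */
    *heap_size = 1024;                  /* default secondary heap bytes */

    if (*arg != ',' && *arg != '\0') {  /* first parameter present */
        *hash_size = strtol(arg, &end, 10);
        if (end == arg || *hash_size <= 0)
            return -1;
        arg = end;
    }
    if (*arg == ',') {                  /* second parameter after comma */
        arg++;
        *heap_size = strtol(arg, &end, 10);
        if (end == arg || *heap_size <= 0)
            return -1;
        arg = end;
    }
    return *arg == '\0' ? 0 : -1;       /* reject trailing junk */
}
```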
If you're really tight for memory, and are assembling small modules,
you can use this keyword to shrink these areas below their default sizes.
At the end of an assembly, a message will be displayed giving the sizes
actually used, in the form of the -w command you would have to enter to
allocate that much space. This is primarily useful to see how much
secondary heap space was used.
NOTE: All other table storage (e.g. the actual symbol table) is
allocated as required (currently in 8K chunks).
The -x keyword works the same as -l, except that a symbol table
dump, including cross-reference information, will be added to the end
of the listing file.
The -y keyword causes hashing statistics to be displayed. First
the number of symbols in the table is given, followed by a breakdown
of hash chains by length. Chains with length zero denote unused hash
table entries. Ideally (i.e. if there were no collisions) there should
be as many chains with length 1 as there are symbols, and there should
be no chains of length 2 or greater. I added this option to help me
tune my hashing algorithm, but you can also use it to see whether you
should allocate a larger hash table (using the first parameter of the
-w option, see above).
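The statistics printed by -y amount to a histogram of chain lengths in a chained hash table. The C sketch below shows how such a histogram can be computed; struct sym, sym_insert() and chain_histogram() are hypothetical, and the multiply-by-31 hash function is an arbitrary example, not the one A68k uses.

```c
#include <stddef.h>

/* Hypothetical symbol entry; colliding entries are linked through next. */
struct sym {
    const char *name;
    struct sym *next;
};

/* Example string hash (not A68k's), reduced modulo the table size. */
static unsigned hash_name(const char *s, unsigned table_size)
{
    unsigned h = 0;
    while (*s != '\0')
        h = h * 31u + (unsigned char)*s++;
    return h % table_size;
}

/* Links s onto the front of its hash chain. */
static void sym_insert(struct sym **table, unsigned table_size, struct sym *s)
{
    unsigned i = hash_name(s->name, table_size);
    s->next = table[i];
    table[i] = s;
}

/* counts[k] receives the number of chains of length k for k < maxlen, and
 * counts[maxlen] the number of chains of length maxlen or more.
 * counts[0] is therefore the number of unused table entries. */
static void chain_histogram(struct sym **table, unsigned table_size,
                            unsigned *counts, unsigned maxlen)
{
    for (unsigned k = 0; k <= maxlen; k++)
        counts[k] = 0;
    for (unsigned i = 0; i < table_size; i++) {
        unsigned len = 0;
        for (const struct sym *p = table[i]; p != NULL; p = p->next)
            len++;
        counts[len < maxlen ? len : maxlen]++;
    }
}
```

With no collisions the histogram contains only lengths 0 and 1, the ideal case described above; a noticeable count at length 2 or more suggests allocating a larger hash table with -w.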
The -z keyword is provided for debugging purposes. You can cause
the assembler to list a range of source lines, complete with line number
and current location counter value, during both passes. For example:
-z lists all source lines
-z100,200 lists lines 100 through 200
-z100 lists all lines starting at 100
-z,100 lists the first 100 lines
If you wish to override the default object and (optionally) listing
file names, you can omit the -o and -l keywords. The assembler interprets
the first three parameters without leading hyphens as the source, object,
and listing file names respectively. Anything over three file names is an
error, as is attempting to respecify a file name with the -o or -l keywords.
TECHNICAL INFORMATION
The actual symbol table entries (pointed to by the hash table,
colliding entries are linked together) are stored in 8K chunks which
are allocated as required. The first entry of each chunk is reserved
as a link to the next chunk (or NULL in the last chunk) - this makes
it easy to find all the chunks to free them when we're finished. All
symbol table entries are stored in pass 1. During pass 2, cross-reference
table entries are built in the same group of chunks, immediately following
the last symbol table entry. Additional chunks will continue to be
linked in if necessary.
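The chunk scheme described above can be sketched in C as a simple pool allocator: each 8K chunk begins with a pointer to the next chunk, entries are carved from the current chunk in order, and freeing everything is a walk down the chain. The names (struct pool, pool_alloc(), pool_free_all()) are hypothetical, as is the rounding of entry sizes to a multiple of the pointer size to keep entries aligned.

```c
#include <stdlib.h>

#define CHUNK_SIZE 8192  /* 8K chunks, as described above */

/* Each chunk starts with a link to the next chunk (NULL in the last);
 * entry storage follows the link. */
struct chunk {
    struct chunk *next;
};

struct pool {
    struct chunk *first, *last;
    size_t used;  /* bytes used in the last chunk, including the link */
};

/* Hypothetical: carves size bytes from the current chunk, linking in a
 * new chunk when it is full. Returns NULL if out of memory or too big. */
static void *pool_alloc(struct pool *p, size_t size)
{
    /* Assumption: pointer-size alignment suffices for entries made of
     * pointers and int-sized fields. */
    size = (size + sizeof(void *) - 1) & ~(sizeof(void *) - 1);
    if (size > CHUNK_SIZE - sizeof(struct chunk))
        return NULL;
    if (p->last == NULL || p->used + size > CHUNK_SIZE) {
        struct chunk *c = malloc(CHUNK_SIZE);
        if (c == NULL)
            return NULL;
        c->next = NULL;
        if (p->last != NULL)
            p->last->next = c;          /* link onto the end of the chain */
        else
            p->first = c;
        p->last = c;
        p->used = sizeof(struct chunk);
    }
    void *entry = (char *)p->last + p->used;
    p->used += size;
    return entry;
}

/* Follows the links to free every chunk. */
static void pool_free_all(struct pool *p)
{
    struct chunk *c = p->first;
    while (c != NULL) {
        struct chunk *n = c->next;
        free(c);
        c = n;
    }
    p->first = p->last = NULL;
    p->used = 0;
}
```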
Symbol names and macro text are stored in another series of linked
chunks. These chunks consist of a link pointer followed by strings
(terminated by nulls) laid end to end. Symbols are independent entries,
linked from the corresponding symbol table entry. Macros are stored as
consecutive strings, one per line - the end of the macro is indicated by
an ENDM statement. If a macro spans two chunks, the last line in the
original chunk is followed by a newline character to indicate that the
macro is continued in the next chunk.
Relocation information is built during pass 2 in yet another series
of linked chunks. If more than one chunk is needed to hold one section's
relocation information, all additional chunks are released at the end of
the section.
The secondary heap is built from both ends, and it grows and shrinks
according to how many macros and INCLUDE files are currently open. At
all times there will be at least one entry on the heap, for the original
source code file. The expression parser also uses the secondary heap to
store its working stacks - this space is freed as soon as an expression
has been evaluated.
The bottom of the heap holds the names of the source code file and
any macro or INCLUDE files that are currently open. The full path is
given. A null string is stored for user macros. Macro arguments are
stored as additional strings, one for each argument in the macro call line.
All strings are stored in minimum space, similar to the labels and user
macro text on the primary heap. File names are pointed to by the fixed
table entries (see below) - macro arguments are accessed by stepping past
the macro name to the desired argument, unless NARG would be exceeded.
The fixed portion of the heap is built down from the top. Each entry
occupies 16 bytes. Enough information is stored to return to the proper
position in the outer file once the current macro or INCLUDE file has been
completely processed.
The diagram below illustrates the layout of the secondary heap.
Heap2 + maxheap2 -----------> ___________________________
| |
| Input file table |
struct InFCtl *InF ---------> |___________________________|
| |
| Parser operator stack |
struct OpStack *Ops --------> |___________________________|
| |
| (unused space) |
struct TermStack *Term -----> |___________________________|
| |
| Parser term stack |
char *NextFNS --------------> |___________________________|
| |
| Input file name stack |
char *Heap2 ----------------> |___________________________|
The "high-water mark" for NextFNS is stored in char *High2,
and the "low-water mark" (to stretch a metaphor) for InF is stored
in struct InFCtl *LowInF. These figures are used only to determine
the maximum heap usage.
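A two-ended heap of this kind can be sketched in C with two pointers that move toward each other: strings are pushed up from the bottom, fixed 16-byte entries are pushed down from the top, and overflow occurs when they would cross. This simplified sketch keeps only the file name stack and the input file table (the parser stacks are omitted). Its field names loosely mirror NextFNS, High2 and LowInF from the diagram but are otherwise hypothetical, and adding the two water marks to estimate peak usage is an assumption: it gives an upper bound, since both peaks need not occur at the same time.

```c
#include <stddef.h>
#include <string.h>

#define ENTRY_SIZE 16  /* each fixed entry occupies 16 bytes */

struct heap2 {
    char *base;      /* bottom of the heap (Heap2) */
    char *limit;     /* top of the heap (Heap2 + maxheap2) */
    char *next_fns;  /* next free byte of the name stack, grows up */
    char *top;       /* lowest fixed entry, grows down */
    char *high2;     /* high-water mark of next_fns */
    char *low_top;   /* low-water mark of top */
};

static void heap2_init(struct heap2 *h, char *mem, size_t size)
{
    h->base = h->next_fns = h->high2 = mem;
    h->limit = h->top = h->low_top = mem + size;
}

/* Pushes a null-terminated string at the bottom; NULL means overflow. */
static char *heap2_push_string(struct heap2 *h, const char *s)
{
    size_t n = strlen(s) + 1;
    if ((size_t)(h->top - h->next_fns) < n)
        return NULL;
    char *dst = h->next_fns;
    memcpy(dst, s, n);
    h->next_fns += n;
    if (h->next_fns > h->high2)
        h->high2 = h->next_fns;
    return dst;
}

/* Reserves one fixed entry at the top; NULL means overflow. */
static void *heap2_push_entry(struct heap2 *h)
{
    if ((size_t)(h->top - h->next_fns) < ENTRY_SIZE)
        return NULL;
    h->top -= ENTRY_SIZE;
    if (h->top < h->low_top)
        h->low_top = h->top;
    return h->top;
}

/* Releases the most recent fixed entry and all strings pushed since mark,
 * e.g. when a macro or INCLUDE file has been completely processed.
 * The caller must pop in the reverse order of pushing. */
static void heap2_pop(struct heap2 *h, char *mark)
{
    h->top += ENTRY_SIZE;
    h->next_fns = mark;
}

/* Upper bound on bytes needed, from the two water marks. */
static size_t heap2_peak(const struct heap2 *h)
{
    return (size_t)(h->high2 - h->base) + (size_t)(h->limit - h->low_top);
}
```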
AND FINALLY...
Please send me any bug reports, flames, etc. I can be reached
on Mind Link (604/533-2312), at any Panorama (PAcific NORthwest AMiga
Association) meeting, or via Jeff Lydiatt or Larry Phillips.
(I don't have the time or money to live on Usenet or CompuServe, etc.)
Charlie Gibbs
2121 Rindall Avenue
Port Coquitlam, B.C. V3C 1T9